This is an R Markdown notebook, which combines text, code, and code outputs into one document. It was created to demonstrate a few features of the COINr package for the COIN week training 2021. It is not meant to describe each step thoroughly; it simply records the commands used in the demo.
Assuming you have R (and ideally R Studio) already installed, COINr can be installed in either of two ways.
The official CRAN version can be installed by running:
install.packages("COINr")
Alternatively, you can browse for the package in R Studio. The CRAN version will be updated every 1-2 months or so, and has passed all official CRAN checks (there are many).
If you want the very latest version of COINr (I am usually adding features and fixing bugs as I find them), you can install the development version from GitHub. First, install the ‘devtools’ package if you don’t already have it, then run:
devtools::install_github("bluefoxr/COINr")
This should install the package directly from GitHub, without any other steps. You may be asked to update packages. This might not be strictly necessary, so you can also try skipping this step.
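Before installing anything, it can be useful to check what is already on your machine. The following is a minimal base R sketch; it only reports which of the two packages mentioned above are installed and does not install or update anything.

```r
# Packages mentioned in the installation steps above
pkgs <- c("devtools", "COINr")

# TRUE if the package can be loaded from the current library, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need installing
to_install <- pkgs[!installed]
```

If `to_install` is not empty, the corresponding packages can be installed with the commands shown above.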
To build a composite indicator in COINr, you first get your data into R, then arrange it into the input tables that COINr needs, and then assemble a COIN using the assemble() function.
The first step can be done in any number of ways. If your data is mainly in Excel, you can read it in using R Studio’s “Import Dataset” tool, which uses the readxl library. You can also read from csv, or download your data directly into R from sources such as Eurostat, using e.g. the ‘eurostat’ package. R has interfaces to many data APIs.
The second step is linked to the first. COINr requires three data frames as its main inputs. These give the indicator data, metadata, and the structure of the index, among other things. This means that besides importing the data, you also have to adjust it to the correct format. Note that this step can also be done before importing the data into R: for example, you may prefer to assemble the tables in Excel (or another tool) first. In this case, steps 1 and 2 would be reversed.
The easiest way to understand the format needed for COINr is to look at the built-in example data set in the COINr package. Please also see the COINr vignette and the online book, where explanations are given in detail. Here, we simply view each of the three input data frames required.
The first data frame, called IndData, specifies the indicator values, for each country (or more generally, for each unit):
library(COINr)
##
## Attaching package: 'COINr'
## The following object is masked from 'package:stats':
##
## aggregate
ASEMIndData
The second, IndMeta, gives the metadata for each indicator. Here, the index structure and indicator weights are also defined.
ASEMIndMeta
Finally, the AggMeta data frame gives some details about the names and weights of each aggregation group.
ASEMAggMeta
You can also explore these data frames using View() in R Studio.
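To make the three-table idea concrete, here is a toy sketch in base R. All values are invented, and the column names (UnitName, UnitCode, IndName, IndCode, Direction, IndWeight, Agg1, AgLevel, Code, Name, Weight) are assumptions made for illustration; check them against ASEMIndData, ASEMIndMeta and ASEMAggMeta before building your own tables. The final lines mimic the kind of cross-check between tables that is needed for the inputs to be consistent.

```r
# Toy indicator data: one row per unit, one column per indicator
toy_IndData <- data.frame(
  UnitName = c("Country A", "Country B", "Country C"),
  UnitCode = c("AAA", "BBB", "CCC"),
  Ind1 = c(10, 20, 15),
  Ind2 = c(0.3, 0.1, 0.7)
)

# Toy indicator metadata: one row per indicator, including its parent group
toy_IndMeta <- data.frame(
  IndName = c("Indicator 1", "Indicator 2"),
  IndCode = c("Ind1", "Ind2"),
  Direction = c(1, -1),   # assumed: 1 = higher is better, -1 = lower is better
  IndWeight = c(1, 1),
  Agg1 = c("Index", "Index")
)

# Toy aggregation metadata: one row per aggregate group
toy_AggMeta <- data.frame(
  AgLevel = 2,            # assumed: indicators are level 1
  Code = "Index",
  Name = "Toy index",
  Weight = 1
)

# Cross-check: every indicator column has metadata, and every parent group is defined
ind_cols <- setdiff(names(toy_IndData), c("UnitName", "UnitCode"))
codes_ok <- setequal(ind_cols, toy_IndMeta$IndCode)
groups_ok <- all(toy_IndMeta$Agg1 %in% toy_AggMeta$Code)
```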
Having your three data frames in hand, you now assemble the COIN.
ASEM <- assemble(IndData = ASEMIndData,
                 IndMeta = ASEMIndMeta,
                 AggMeta = ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------
COINr returns some details about the new COIN and runs a number of checks to make sure the supplied data frames obey the rules. The idea is that if it is possible to assemble a COIN, from this point onwards things should be fairly straightforward.
Examine the COIN by running View(ASEM) in R Studio. Notice that we have one data set, called “Raw”.
Now we can plot the structure of the index.
plotframework(ASEM)
We can also check indicator statistics.
ASEM <- getStats(ASEM, dset = "Raw")
## Number of collinear indicators = 3
## Number of signficant negative indicator correlations = 322
## Number of indicators with high denominator correlations = 7
ASEM$Analysis$Raw$StatTable |> roundDF()
Statistics can be added to the COIN or output as a separate list of data frames. The getStats() function also outputs correlation tables.
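As a rough idea of what an indicator statistics table contains, the following base R sketch computes a few summary statistics and flags highly correlated indicator pairs. The toy data, the moment-based skewness definition and the 0.9 correlation threshold are all assumptions for illustration, not necessarily what getStats() uses.

```r
# Toy data: 3 indicators for 6 units (values invented), with one missing value
toy <- data.frame(
  Ind1 = c(1, 2, 3, 4, 5, 6),
  Ind2 = c(2, 4, 6, 8, 10, 12.5),
  Ind3 = c(5, 3, NA, 8, 1, 40)
)

# Moment-based skewness (one of several possible definitions)
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# One row of statistics per indicator
stat_table <- data.frame(
  Indicator = names(toy),
  Mean = sapply(toy, mean, na.rm = TRUE),
  SD = sapply(toy, sd, na.rm = TRUE),
  Skew = sapply(toy, skew),
  PrcMissing = sapply(toy, function(x) 100 * mean(is.na(x))),
  row.names = NULL
)

# Flag indicator pairs whose absolute correlation exceeds the assumed threshold
cmat <- cor(toy, use = "pairwise.complete.obs")
high_corr <- which(abs(cmat) > 0.9 & upper.tri(cmat), arr.ind = TRUE)
```

Here `high_corr` lists the row and column positions of each flagged pair in the correlation matrix.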
We can view distributions of individual indicators, or groups of indicators.
# A single indicator
plotIndDist(ASEM, dset = "Raw", icodes = "Goods", type = "Histogram")
# A named group of indicators
plotIndDist(ASEM, dset = "Raw", icodes = "Political", type = "Violindot")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
We can also see ordered indicator values, in this case, from a group of units (Europe).
iplotBar(ASEM, dset = "Raw", isel = "CO2", aglev = 1, from_group = list(Group_EurAsia = "Europe"))
We can now build the index. We will impute, treat outliers, normalise and aggregate. First the imputation.
# impute missing values with GDP group median
ASEM <- impute(ASEM, dset = "Raw", imtype = "indgroup_median", groupvar = "Group_GDP")
## Missing data points detected = 63
## Missing data points imputed = 63, using method = indgroup_median
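The idea of imputing with a group median can be sketched in a few lines of base R. This is a toy illustration on invented data, not the internal code of impute(): each missing value is replaced by the median of the indicator within the unit's group.

```r
# Toy data: one indicator with two missing values, and a grouping variable
toy <- data.frame(
  UnitCode = c("A", "B", "C", "D", "E", "F"),
  Group_GDP = c("L", "L", "L", "H", "H", "H"),
  Ind1 = c(1, NA, 3, 10, 30, NA)
)

# Replace each missing value with the median of its own group
impute_group_median <- function(x, g) {
  ave(x, g, FUN = function(v) {
    v[is.na(v)] <- median(v, na.rm = TRUE)
    v
  })
}

toy$Ind1_imputed <- impute_group_median(toy$Ind1, toy$Group_GDP)
```

In this toy example, unit B receives the median of group L and unit F receives the median of group H.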
Now we treat any outliers from the imputed data using a standard Winsorisation and log transform approach.
ASEM <- treat(ASEM, dset = "Imputed", winmax = 5)
We can see which indicators were treated using information stored in the COIN.
ASEM$Analysis$Treated$TreatSummary
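For readers unfamiliar with Winsorisation, here is a minimal base R sketch of the idea: outlying points are pulled in, one at a time, to the nearest remaining value, up to a maximum number of points. The skewness and kurtosis thresholds (2 and 3.5) and the moment-based definitions are assumptions for this sketch and should be checked against the treat() documentation. The log transform fallback is not shown.

```r
# Moment-based skewness and kurtosis (definitions assumed for this sketch)
skew <- function(x) { m <- mean(x); mean((x - m)^3) / mean((x - m)^2)^1.5 }
kurt <- function(x) { m <- mean(x); mean((x - m)^4) / mean((x - m)^2)^2 }

# Winsorise one point at a time, up to winmax points, while the
# distribution exceeds both (assumed) thresholds
winsorise <- function(x, winmax = 5, skew_thresh = 2, kurt_thresh = 3.5) {
  nwin <- 0
  while (nwin < winmax &&
         abs(skew(x)) > skew_thresh && kurt(x) > kurt_thresh) {
    if (skew(x) > 0) {
      # high outlier: pull the maximum down to the next highest value
      imax <- which.max(x)
      x[imax] <- max(x[-imax])
    } else {
      # low outlier: pull the minimum up to the next lowest value
      imin <- which.min(x)
      x[imin] <- min(x[-imin])
    }
    nwin <- nwin + 1
  }
  list(x = x, nwin = nwin)
}

# Toy indicator with one obvious high outlier (values invented)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 100)
res <- winsorise(x)
```

In this toy case a single point is Winsorised: the value 100 is replaced by 5, after which the distribution passes the thresholds.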
We can also visualise and compare before/after distributions using a built-in app.
# only run in interactive R session
indDash(ASEM)
Now we will normalise the treated indicators using a min-max approach between 1 and 100.
ASEM <- normalise(ASEM, dset = "Treated", ntype = "minmax", npara = list(minmax = c(1,100)))
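Min-max normalisation simply rescales each indicator linearly onto a chosen interval. A minimal base R sketch, using invented values and the same [1, 100] interval as above:

```r
# Rescale x linearly so that min(x) maps to `lower` and max(x) maps to `upper`
minmax <- function(x, lower = 1, upper = 100) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower) + lower
}

x <- c(2, 4, 6, 10)
x_norm <- minmax(x)
# x_norm is 1.00, 25.75, 50.50, 100.00
```

This sketch ignores indicator direction; an indicator where lower values are better would need to be reversed before or after rescaling.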
Finally we will aggregate the normalised data according to the structure already specified in IndMeta. We will use an arithmetic mean for the first and second levels of aggregation (indicators to pillars and pillars to sub-indexes), and geometric mean for the last (sub-indexes to index). Weights are already input in IndMeta.
ASEM <- aggregate(ASEM, agtype = "mixed", dset = "Normalised",
                  agtype_bylevel = c("arith_mean", "arith_mean", "geom_mean"))
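The difference between the two aggregation methods can be seen in a small base R sketch for a single unit. The scores and equal weights are invented; the function names are hypothetical helpers, not COINr functions.

```r
# Weighted arithmetic and geometric means
w_arith <- function(x, w) sum(w * x) / sum(w)
w_geom <- function(x, w) exp(sum(w * log(x)) / sum(w))  # requires x > 0

# Toy normalised pillar scores for one unit, two pillars per sub-index
conn_pillars <- c(30, 50)
sust_pillars <- c(80, 100)

# Pillars to sub-indexes: arithmetic mean
conn <- w_arith(conn_pillars, w = c(1, 1))   # 40
sust <- w_arith(sust_pillars, w = c(1, 1))   # 90

# Sub-indexes to index: geometric mean
index <- w_geom(c(conn, sust), w = c(1, 1))  # approximately 60
```

An arithmetic mean of the two sub-indexes would give 65, so the geometric mean penalises the imbalance between them. It also requires strictly positive values, which is one reason to normalise onto [1, 100] rather than [0, 100].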
Now let’s see the results. First, a simple table.
# get a results table and write to COIN
ASEM <- getResults(ASEM, tab_type = "Aggregates", out2 = "COIN")
# display results
ASEM$Results$AggregatesScores
We write to the COIN because later this will all be exported in one go. Now we can explore the results in the app.
# only run in interactive R session
resultsDash(ASEM)
All plots available in the app can also be accessed individually. We can see a map of overall scores.
iplotMap(ASEM, dset = "Aggregated", isel = "Index")
Finally, we export everything to Excel.
coin2Excel(ASEM, fname = "ASEM_demo_results.xlsx")
COINr has sophisticated correlation plotting which also accounts for the hierarchical structure of the index.
# sustainability indicators
plotCorr(ASEM, dset = "Normalised", icodes = "Sust", showvals = FALSE, flagcolours = TRUE,
         grouplev = 0, box_level = 2)
We can also plot aggregates against indicators, or indeed correlate more or less anything against anything else.
plotCorr(ASEM, dset = "Aggregated", aglevs = c(1,2), box_level = 2, withparent = "none", box_colour = "black")
We can also see the correlations of indicators or aggregates with all parent levels.
plotCorr(ASEM, dset = "Aggregated", aglevs = 1, icodes = "Sust", withparent = "family", flagcolours = TRUE)
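The quantity behind a "correlation with parent" plot is just the correlation between each indicator and the aggregate it feeds into. A minimal base R sketch with simulated data follows; the indicator names and the equally weighted mean used as the parent are assumptions for illustration.

```r
set.seed(1)

# Simulated indicators for 30 units
toy <- data.frame(Ind1 = rnorm(30), Ind2 = rnorm(30), Ind3 = rnorm(30))
ind_codes <- c("Ind1", "Ind2", "Ind3")

# Parent aggregate: here simply the unweighted mean of the three indicators
toy$Pillar <- rowMeans(toy[ind_codes])

# Correlation of each indicator with its parent (a 3 x 1 matrix)
cor_with_parent <- cor(toy[ind_codes], toy$Pillar)
```

Low or negative values in such a table suggest that an indicator contributes little to, or pulls against, its parent aggregate.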
If you wish to change weights and see the effects on correlations, you can use the built-in re-weighting app.
# only in interactive R session
rew8r(ASEM)
A major advantage of working with COINr is that making alternative indexes and adjustments is very easy. It is done by editing the .$Method folder in the COIN and then calling regen().
# Make a copy
ASEMAltNorm <- ASEM
# Edit .$Method
ASEMAltNorm$Method$normalise$ntype <- "borda"
# Regenerate
ASEMAltNorm <- regen(ASEMAltNorm, quietly = TRUE)
Next, the two alternative COINs can be compared using compTable().
compTable(ASEM, ASEMAltNorm, dset = "Aggregated",
isel = "Index") |>
head(10)
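A comparison table of this kind essentially lines up the ranks from two versions of the index and sorts by the size of the rank change. The following base R sketch shows the idea on invented scores; the column names are hypothetical and need not match the output of compTable().

```r
# Invented index scores from a baseline and an alternative methodology
scores <- c(AAA = 72, BBB = 55, CCC = 90, DDD = 61)
alt_scores <- c(AAA = 70, BBB = 66, CCC = 88, DDD = 58)

# Rank so that the highest score gets rank 1
rank_desc <- function(x) rank(-x, ties.method = "min")

comp <- data.frame(
  UnitCode = names(scores),
  RankBase = rank_desc(scores),
  RankAlt = rank_desc(alt_scores),
  row.names = NULL
)
comp$RankChange <- comp$RankBase - comp$RankAlt
comp$AbsRankChange <- abs(comp$RankChange)

# Largest rank changes first
comp <- comp[order(-comp$AbsRankChange), ]
```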
This can be applied to any methodology in the COIN.
To specify a sensitivity analysis, COINr needs to know which parameters to perturb, and what the alternative values should be. This is done by creating a named list which is input as an argument to sensitivity(). The following gives an example of this list, perturbing three assumptions (imputation method, normalisation method and weights).
# define noise to be applied to weights
nspecs <- data.frame(AgLevel = c(2,3), NoiseFactor = c(0.25,0.25))
# create list specifying assumptions to vary and alternatives
SAspecs <- list(
impute = list(imtype = c("indgroup_mean", "ind_mean", "none")),
normalise = list(ntype = c("minmax", "rank", "dist2max")),
weights = list(NoiseSpecs = nspecs, Nominal = "Original")
)
Now run the sensitivity analysis (this takes a few minutes).
# This will take a few minutes to run
SAresults <- sensitivity(ASEM, v_targ = "Index",
                         SA_specs = SAspecs,
                         N = 500,
                         SA_type = "SA", Nboot = 1000)
These outputs can be inspected directly, or via COINr’s plotting functions for sensitivity analysis, plotSA() and plotSARanks().
plotSARanks(SAresults)
Confidence intervals on index ranks.
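To give a feel for where rank confidence intervals come from, here is a much simplified Monte Carlo sketch in base R. It perturbs only the weights, using uniform noise of plus or minus 25% around nominal values (an assumption about how a noise factor might be applied), recomputes an arithmetic-mean index each time, and summarises the resulting ranks. All scores are invented, and this shows only the uncertainty part of the analysis, not the sensitivity indices.

```r
set.seed(42)

# Invented normalised scores: 5 units x 3 components
scores <- matrix(c(80, 60, 70,
                   55, 75, 65,
                   90, 40, 50,
                   30, 35, 45,
                   60, 62, 58),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C", "D", "E"),
                                 c("P1", "P2", "P3")))

w_nominal <- c(1, 1, 1)
noise_factor <- 0.25
N <- 500

# Each replication: perturb weights, recompute the index, rank the units
rank_draws <- replicate(N, {
  w <- w_nominal * (1 + runif(length(w_nominal), -noise_factor, noise_factor))
  index <- as.vector(scores %*% (w / sum(w)))
  rank(-index, ties.method = "min")
})
rownames(rank_draws) <- rownames(scores)

# Median rank with 5th and 95th percentiles for each unit
rank_summary <- data.frame(
  Unit = rownames(scores),
  Median = apply(rank_draws, 1, median),
  Q5 = apply(rank_draws, 1, quantile, probs = 0.05),
  Q95 = apply(rank_draws, 1, quantile, probs = 0.95),
  row.names = NULL
)
```

Units whose interval between Q5 and Q95 is wide have ranks that depend strongly on the weights; unit D here scores lowest on every component, so its rank never changes.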
The sensitivity indices can be visualised in several ways - one of these is as box plots.
# plot sensitivity indices as box plots
plotSA(SAresults, ptype = "box")
A somewhat related exercise is to see what happens on removing indicators and even entire components of the index.
testresults <- removeElements(ASEM, aglev = 1, isel = "Index", quietly = TRUE)
library(ggplot2)
ggplot(data.frame(Indicator = names(testresults$MeanAbsDiff[-1]),
Impact = testresults$MeanAbsDiff[-1]),
aes(x=Indicator, y=Impact)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
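The logic of this test can be sketched in base R as a leave-one-out loop: drop each component in turn, recompute the index, and record the mean absolute change in ranks. The scores are invented and the index is a plain unweighted mean, so this is only an illustration of the idea, not of how removeElements() is implemented.

```r
# Invented normalised scores: 5 units x 3 components
scores <- matrix(c(80, 60, 70,
                   55, 75, 65,
                   90, 38, 50,
                   30, 35, 45,
                   60, 62, 61),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C", "D", "E"),
                                 c("P1", "P2", "P3")))

# Index = unweighted mean of components; rank 1 = highest index
index_rank <- function(m) rank(-rowMeans(m), ties.method = "min")

base_rank <- index_rank(scores)

# Mean absolute rank change when each component is removed in turn
mean_abs_diff <- sapply(colnames(scores), function(p) {
  reduced <- scores[, colnames(scores) != p, drop = FALSE]
  mean(abs(index_rank(reduced) - base_rank))
})
```

The component with the largest value in `mean_abs_diff` is the one whose removal changes the ranking most.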
Last of all, we can generate unit reports for any country based on a template. For example, for New Zealand, run:
getUnitReport(ASEM, usel = "NZL", out_type = ".html")
Note that this also works with multiple units at the same time.
You can get more info on COINr at